
How SPARK Samples Filecoin Deals
SPARK is checking whether public content stored on Filecoin can be retrieved. To do so, we need to find out which Filecoin deals store data that’s expected to be publicly available.
Filecoin was designed to store all kinds of data, but not all of it is meant to be publicly retrievable. For these “private data” deals, it’s up to the client and the Storage Provider to agree on how the client can access the stored data. Such an agreement happens off-chain.
On the other side of the spectrum is the community program called Filecoin Plus for Large Datasets, often abbreviated as FIL+ LDN. This program aims to incentivise the storage of public open datasets on Filecoin, such as measurements produced by scientific experiments. There is a clear expectation that content stored through FIL+ LDN "should be readily retrievable on the network and this can be regularly verified" (quoted from the current scope in the FIL+ LDN docs).
While FIL+ LDN does not cover all publicly retrievable data, it gives us a great start.
Listing active FIL+ LDN deals
How can we find all FIL+ LDN deals to choose some of them to check? There are three steps in this process:
- Get a list of all storage deals
- Filter active FIL+ deals
- Keep FIL+ LDN deals only
Get a list of all storage deals
Storage deals are managed by the built-in Storage Market Actor. The RPC API method Filecoin.StateMarketDeals returns a list of all deals created since the Filecoin Mainnet genesis. As you can imagine, it’s a lot of data - more than 20 GB in April 2024 - and the size is steadily growing as more deals are created over time. As a result, most RPC API providers have disabled access to this RPC method.
Fortunately, the awesome folks at Glif.io create hourly snapshots of the StateMarketDeals data; the latest snapshot is publicly available via their Amazon S3 link.
In Spark, we use this snapshot as the data source of all storage deals.
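For illustration, here is a minimal sketch of iterating over the deals in a snapshot that has already been parsed into a JavaScript object. The entry shape (a map from deal ID to `Proposal` and `State`) follows the Lotus state format; a real pipeline would stream-parse the multi-gigabyte file rather than load it into memory.

```javascript
// Iterate deals in a parsed StateMarketDeals snapshot.
// Each entry maps a deal ID to an object with Proposal and State fields.
function * listDeals (stateMarketDeals) {
  for (const [dealId, deal] of Object.entries(stateMarketDeals)) {
    yield { dealId: Number(dealId), proposal: deal.Proposal, state: deal.State }
  }
}

// Tiny illustrative snapshot (the values are made up):
const snapshot = {
  '1': { Proposal: { Verified: true }, State: { SectorStartEpoch: 10 } }
}
for (const { dealId, proposal } of listDeals(snapshot)) {
  console.log(dealId, proposal.Verified) // 1 true
}
```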
Filter active FIL+ deals
The next step in our deal-processing pipeline is discarding all deals that are not active or that are not part of the FIL+ program. This is straightforward to implement using the following fields in the DealProposal objects from the Market Deals state:
- `Verified` is a boolean field set to `true` if the deal is part of FIL+.
- `StartEpoch` and `EndEpoch` specify the time interval when the deal is active.
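A minimal sketch of this filter, assuming epochs are plain numbers and `currentEpoch` is obtained elsewhere (for example, from the chain head):

```javascript
// Keep only deals that are verified (FIL+) and currently active.
function isActiveFilPlusDeal (proposal, currentEpoch) {
  return proposal.Verified === true &&
    proposal.StartEpoch <= currentEpoch &&
    currentEpoch <= proposal.EndEpoch
}

// Illustrative values only:
const proposal = { Verified: true, StartEpoch: 100, EndEpoch: 200 }
console.log(isActiveFilPlusDeal(proposal, 150)) // true
console.log(isActiveFilPlusDeal(proposal, 250)) // false
```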
Keep FIL+ LDN deals only
Lastly, we must filter the deals to keep only those made as part of the FIL+ LDN program. Theoretically, all data needed to construct such a filter is available in the on-chain state. In practice, it was easier to implement the following heuristics, which seem to work well.
First, we build a list of all clients that are verified for FIL+ LDN. We are using the following two endpoints offered by the public DataCapStats.io API:
```js
// API_KEY holds your DataCapStats.io API key.
const notaries = await findNotaries()

// Collect clients into a Set to remove duplicates across notaries.
const allLdnClients = new Set()
for (const notaryAddressId of notaries) {
  const clients = await getVerifiedClientsOfNotary(notaryAddressId)
  for (const client of clients) allLdnClients.add(client)
}

async function findNotaries () {
  const res = await fetch(
    'https://api.datacapstats.io/public/api/getVerifiers?limit=1000',
    { headers: { 'X-API-KEY': API_KEY } }
  )
  const body = await res.json()
  return body.data.map(obj => obj.addressId)
}

async function getVerifiedClientsOfNotary (notaryAddressId) {
  // The notary id is interpolated, so the URL must be a template literal.
  const res = await fetch(
    `https://api.datacapstats.io/public/api/getVerifiedClients/${notaryAddressId}?limit=1000`,
    { headers: { 'X-API-KEY': API_KEY } }
  )
  const body = await res.json()
  return body.data.map(obj => obj.addressId).filter(val => !!val)
}
```

Second, to determine whether a deal is expected to be publicly retrievable, we check the Client field of the DealProposal. This field contains the address of the client making the deal. If the client is in the list of clients verified for FIL+ LDN, then we consider the deal to belong to the FIL+ LDN program and to have the expectation of public retrievability.
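This client check can be sketched as follows, with `ldnClients` standing for the set of verified client address IDs built in the previous step (the example values are hypothetical):

```javascript
// Decide whether a deal belongs to FIL+ LDN by checking its client address.
// `ldnClients` is a Set of client address IDs built from the DataCapStats API.
function isLdnDeal (proposal, ldnClients) {
  return ldnClients.has(proposal.Client)
}

// Hypothetical example values for illustration:
const ldnClients = new Set(['f01234', 'f05678'])
console.log(isLdnDeal({ Client: 'f01234' }, ldnClients)) // true
console.log(isLdnDeal({ Client: 'f09999' }, ldnClients)) // false
```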
What’s next
This was the first post in the series explaining how SPARK checks retrievability. In the next posts, we will explore how to find the content identifiers (CIDs) of the data stored in a deal and how to discover the network address from which to fetch the content. Stay tuned!







